Random Rounding

Algorithm

  • Solve \(\operatorname{SDP} (\boldsymbol{W})\) and obtain \(\hat{\boldsymbol{X}}\)

  • Decompose \(\hat{\boldsymbol{X}} = \hat{\boldsymbol{V}}^{\top} \hat{\boldsymbol{V}}\), e.g. using the EVD \(\hat{\boldsymbol{X}} = \hat{\boldsymbol{U}} \hat{\boldsymbol{\Lambda}} \hat{\boldsymbol{U}}^{\top}\) with \(\hat{\boldsymbol{V}} = \sqrt{\hat{\boldsymbol{\Lambda}}}\, \hat{\boldsymbol{U}}^{\top}\), or using a Cholesky factorization. Note that \(\left\| \hat{\boldsymbol{v}}_i \right\| = 1\) always holds, due to the constraint \(\hat{X}_{ii} = 1\).

  • Sample a direction \(\boldsymbol{r}\) uniformly from \(S^{p-1}\)

  • Return binary partition assignment \(\hat{\boldsymbol{x}} = \operatorname{sign} (\boldsymbol{\hat{V}} ^{\top} \boldsymbol{r} )\)
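
The decomposition and rounding steps above can be sketched in NumPy. This is a minimal sketch assuming the SDP solution \(\hat{\boldsymbol{X}}\) has already been computed by a solver; `random_round` is a hypothetical helper name, not from the original text.

```python
import numpy as np

def random_round(X_hat, rng=None):
    """Round an SDP solution X_hat (PSD, unit diagonal) to a +/-1 assignment.

    Hypothetical helper for illustration, not a canonical implementation.
    """
    rng = np.random.default_rng(rng)
    # Eigendecomposition X_hat = U diag(lam) U^T
    lam, U = np.linalg.eigh(X_hat)
    lam = np.clip(lam, 0.0, None)        # guard tiny negative eigenvalues
    V = (np.sqrt(lam) * U).T             # columns v_i satisfy X_hat = V^T V
    # A Gaussian vector, after normalization, is uniform on the sphere;
    # the sign of the projection is scale-invariant, so no normalization needed.
    r = rng.standard_normal(V.shape[0])
    return np.where(V.T @ r >= 0, 1, -1)  # sign of each v_i's projection on r
```

On an exactly rank-1 input \(\hat{\boldsymbol{X}} = \boldsymbol{x}\boldsymbol{x}^{\top}\), the rounding recovers \(\boldsymbol{x}\) up to a global sign flip, which does not change the partition.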

Intuition

  • Randomly sample a hyperplane in \(\mathbb{R}^p\), characterized by its normal vector \(\boldsymbol{r}\). If \(\hat{\boldsymbol{v}}_i\) lies on the same side of the hyperplane as \(\boldsymbol{r}\), then set \(\hat{x}_i = 1\); otherwise set \(\hat{x}_i = -1\).

  • If there indeed exists a partition of the vertices into \(I\) and \(J\), characterized by \(\boldsymbol{x}\), then the two groups of directions \(\boldsymbol{v}_i\)’s and \(\boldsymbol{v}_j\)’s should point in opposite directions, since \(\boldsymbol{v}_i^{\top} \boldsymbol{v}_j = x_i x_j = -1\) for \(i \in I, j \in J\). After random rounding, the two groups should be well separated. Hence, if \(\hat{\boldsymbol{v}}_i^{\top} \hat{\boldsymbol{v}}_j\) recovers \(\boldsymbol{v}_i^{* \top} \boldsymbol{v}^*_j\) well enough, then \(\hat{\boldsymbol{x}}\) is a good estimate of \(\boldsymbol{x}^*\), the optimal max-cut assignment in \(\operatorname{cut}(\boldsymbol{W})\).

\[\begin{split}\begin{aligned} \boldsymbol{X} ^* \text{ to } \operatorname{cut} \ & \Leftrightarrow \quad \boldsymbol{x} ^* \text{ to } \operatorname{cut}\\ \text{SDP relaxation}\quad \downarrow \quad &\qquad \qquad \uparrow \text{recover} \\ \boldsymbol{\hat{X}} \text{ to } \operatorname{SDP} & \xrightarrow[\text{rounding} ]{\text{random} } \hat{\boldsymbol{x}} = \operatorname{sign} (\boldsymbol{\hat{V}} ^{\top} \boldsymbol{r} )\\ \end{aligned}\end{split}\]

To see what \(\boldsymbol{V}\) looks like for binary \(\boldsymbol{x} \in \left\{ \pm 1 \right\}^n\), let \(\boldsymbol{x} = [1, 1, -1, -1, -1]^{\top}\); then

\[\begin{split} \boldsymbol{X} = \left[\begin{array}{rrrrr} 1 & 1 & -1 & -1 & -1 \\ 1 & 1 & -1 & -1 & -1 \\ -1 & -1 & 1 & 1 & 1 \\ -1 & -1 & 1 & 1 & 1 \\ -1 & -1 & 1 & 1 & 1 \end{array}\right],\qquad \boldsymbol{V} = \left[ \begin{array}{rrrrr} -1 & -1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array} \right] \end{split}\]
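This factorization can be checked numerically; note the global sign of \(\boldsymbol{V}\) is arbitrary, since \((-\boldsymbol{x})(-\boldsymbol{x})^{\top} = \boldsymbol{x}\boldsymbol{x}^{\top}\). A quick sketch:

```python
import numpy as np

x = np.array([1, 1, -1, -1, -1])
X = np.outer(x, x)       # X = x x^T

V = np.zeros((5, 5), dtype=int)
V[0] = -x                # first row -x; any global sign gives the same X
print(np.array_equal(V.T @ V, X))  # True
```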

Analysis

How well does the algorithm do in expectation? We define the Goemans-Williamson quantity, which is the expected cut value returned by the algorithm, where the randomness comes from the random direction \(\boldsymbol{r}\).

\[\operatorname{GW} (\boldsymbol{W}) = \mathbb{E}_{\boldsymbol{r}} [f_{\boldsymbol{W} }(\hat{\boldsymbol{x}}) ] = \mathbb{E}_{\boldsymbol{r}} \left[ \frac{1}{2} \sum_{i,j=1}^n w_{ij} (1 - \hat{x}_i \hat{x}_j)\right]\]

Obviously \(\operatorname{GW} (\boldsymbol{W}) \le \operatorname{cut} (\boldsymbol{W})\), since we are averaging the objective values of feasible solutions \(\hat{\boldsymbol{x}} \in \left\{ \pm 1\right\}^n\), each of which is \(\le \operatorname{cut} (\boldsymbol{W})\). But how small can \(\operatorname{GW} (\boldsymbol{W})\) be?

It can be shown that \(\operatorname{GW}(\boldsymbol{W}) \ge \alpha \operatorname{cut}(\boldsymbol{W})\) where \(\alpha \approx 0.878\). That is, in expectation the random rounding algorithm returns a cut value not much smaller than the optimal value.
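The constant \(\alpha\) arises from comparing, for a pair of unit vectors at angle \(\theta\), the probability \(\theta/\pi\) that rounding cuts the pair with the SDP term \((1-\cos\theta)/2\); \(\alpha\) is the worst-case ratio over \(\theta \in (0, \pi]\). A quick numerical sketch of this minimization (not part of the original analysis):

```python
import numpy as np

# Ratio of P(pair is cut after rounding) = theta/pi
# to the SDP contribution (1 - cos theta)/2, minimized over theta
theta = np.linspace(1e-8, np.pi, 200001)
ratio = (theta / np.pi) / ((1 - np.cos(theta)) / 2)
alpha = ratio.min()
print(round(alpha, 3))  # ~0.879
```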

The key geometric fact behind this bound is that a uniformly random hyperplane separates two unit vectors \(\boldsymbol{v}_1\) and \(\boldsymbol{v}_2\) with probability \(\arccos(\boldsymbol{v}_1^{\top} \boldsymbol{v}_2)/\pi\). The simulation below checks this by sampling directions uniformly on the sphere and counting the fraction that separates the two vectors.

import numpy as np
import plotly.express as px
import plotly.io as pio

pio.renderers.default = "notebook"

# two unit vectors at a known angle
v = np.array([[1, 0, 3], [-1, 0, 3]], dtype=float)
v = v / np.linalg.norm(v, axis=1)[:, np.newaxis]

# sample n directions uniformly on the sphere S^2
n = 3000
np.random.seed(1)
x = np.random.normal(size=(n, 3))
x = x / np.linalg.norm(x, axis=1)[:, np.newaxis]

# keep the directions whose hyperplane separates v1 and v2,
# i.e. the projections <x, v1> and <x, v2> have opposite signs
x = x[np.dot(x, v[0]) * np.dot(x, v[1]) < 0, :]

print(f'v1 = {np.round(v[0], 3)}, \nv2 = {np.round(v[1], 3)}')
print(f'arccos(v1,v2)/pi = {np.round(np.arccos(v[0] @ v[1]) / np.pi, 3)}')
print(f'simulated result = {np.round(len(x) / n, 3)}')

fig = px.scatter_3d(x=x[:, 0], y=x[:, 1], z=x[:, 2], size=np.ones(len(x)), range_z=[-1, 1])
fig.add_scatter3d(x=[0, v[0, 0]], y=[0, v[0, 1]], z=[0, v[0, 2]], name='v1')
fig.add_scatter3d(x=[0, v[1, 0]], y=[0, v[1, 1]], z=[0, v[1, 2]], name='v2')
fig.show()
v1 = [0.316 0.    0.949], 
v2 = [-0.316  0.     0.949]
arccos(v1,v2)/pi = 0.205
simulated result = 0.208

Besides, how large can the SDP relaxation value \(\operatorname{SDP} (\boldsymbol{W})\) be? Grothendieck’s Inequality says

\[ K \operatorname{cut}(\boldsymbol{W}) \ge \operatorname{SDP}(\boldsymbol{W}) \]

where \(K \approx 1.7\) is Grothendieck’s constant. Hence the SDP relaxation domain \(\Omega_{\operatorname{SDP}}\) does not enlarge the original domain \(\Omega\) too much (otherwise we might see \(\operatorname{SDP}(\boldsymbol{W}) \gg \operatorname{cut}(\boldsymbol{W})\)), and \(\hat{\boldsymbol{v}}_i^{\top} \hat{\boldsymbol{v}}_j\) should recover the binary product \(x_i^* x_j^*\) well.

For SBM

The above inequalities apply to any problem instance \(G=(V, E, \boldsymbol{W})\), so they may give overly generous or even useless guarantees for a particular model. Let’s examine the algorithm’s performance in stochastic block models.

We work with the mean-shifted matrix \(\boldsymbol{B} = 2\boldsymbol{A} - \boldsymbol{1} \boldsymbol{1} ^{\top}\), where

\[\begin{split} b_{ij} = \left\{\begin{array}{ll} 1, & \text { if } a_{ij}=1 \\ -1, & \text { if } a_{ij}=0 \end{array}\right. \end{split}\]

Essentially, \(\boldsymbol{B}\) just re-codes the connectivity in \(G\) from 1/0 to 1/-1. Note that in an SBM, \(\boldsymbol{B}\) is random, depending on the parameters \(p\) and \(q\). In the perfect case \(p=1, q=0\), we can read the cluster labels \(\boldsymbol{x}\in \left\{ \pm 1 \right\}^n\) directly from \(\boldsymbol{B}\), which can be expressed exactly as \(\boldsymbol{B} = \boldsymbol{x} \boldsymbol{x} ^{\top}\). In general, \(\boldsymbol{B}\) cannot be expressed as \(\boldsymbol{x} \boldsymbol{x} ^{\top}\) for any \(\boldsymbol{x} \in \left\{ \pm 1 \right\}^n\). We instead want to find some \(\boldsymbol{X} = \boldsymbol{x} \boldsymbol{x} ^{\top}\) that is close enough to \(\boldsymbol{B}\), and then use \(\boldsymbol{x}\) as the approximate cluster labels.
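As a tiny illustration of the recoding, using a hypothetical 3-node adjacency matrix (self-edges coded as 1, so the perfect two-block case gives exactly \(\boldsymbol{B} = \boldsymbol{x}\boldsymbol{x}^{\top}\)):

```python
import numpy as np

# toy adjacency: nodes {0, 1} form one block, node {2} the other
A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
B = 2 * A - np.ones((3, 3), dtype=int)   # recode 1/0 -> 1/-1
print(B)
```

Here \(\boldsymbol{B}\) equals \(\boldsymbol{x}\boldsymbol{x}^{\top}\) for \(\boldsymbol{x} = [1, 1, -1]^{\top}\).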

Similar to the max-cut case, we apply an SDP relaxation that drops the rank-1 constraint on \(\boldsymbol{X}\). The SDP problem is then

\[ \max\ \operatorname{tr}\left( \boldsymbol{B} \boldsymbol{X} \right) \qquad \text{s.t. } \boldsymbol{X} \succeq 0, X_{ii}=1 \]

Note \(\operatorname{tr}\left( \boldsymbol{B} \boldsymbol{X} \right) = \langle \boldsymbol{B} , \boldsymbol{X} \rangle = \sum_{i,j=1}^n b_{ij} X_{ij}\). Next, we show that the solution \(\hat{\boldsymbol{X}}\) to the above problem is exactly rank-1, even though we’ve dropped the rank-1 constraint.
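The identity \(\operatorname{tr}(\boldsymbol{B}\boldsymbol{X}) = \sum_{i,j} b_{ij} X_{ij}\), which holds when \(\boldsymbol{X}\) is symmetric, can be spot-checked numerically (a sketch with randomly generated toy data):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
x = rng.choice([-1, 1], size=5)
X = np.outer(x, x)                 # symmetric rank-1 candidate solution

lhs = np.trace(B @ X)              # trace form of the objective
rhs = np.sum(B * X)                # elementwise <B, X> = sum_ij b_ij X_ij
print(np.isclose(lhs, rhs))  # True
```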
